# Load required packages
library(ggplot2)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(dplyr) # group_by(), summarise(), and across() are used in the data preparation
library(ggcorrplot)
library(leaflet)
Colchester, a vibrant town with a rich history, also faces crime-related issues. To establish effective crime prevention and intervention measures, policymakers, law enforcement agencies, and the community must first understand the patterns and trends in crime data. The dataset examined here ranges from crime incident details to meteorological conditions collected throughout 2023. Through analysis and visualization, we seek to uncover patterns, trends, and relationships in the data that can shed light on crime dynamics and on factors that may influence criminal activity.
The Colchester crime dataset gives an overview of criminal activity reported throughout the year. It covers categories including anti-social behavior, burglary, vehicle crime, shoplifting, and drug-related offenses, among others. Each entry records the type of crime, its location, the date of occurrence, and the outcome status. The temperature dataset complements the crime data with meteorological information recorded in Colchester over the same period, including temperature, precipitation, wind speed, visibility, and atmospheric pressure. Weather data is typically collected at regular intervals, such as hourly or daily, which allows seasonal variations and weather patterns to be examined.
We prepare the crime and temperature data for analysis by removing unnecessary columns, averaging the weather readings by month, and merging the two datasets on the month. This keeps the focus on the information needed to understand how crime and weather relate in Colchester.
# Set the working directory to the location of the data files
setwd("C:/Users/admin/OneDrive/Desktop/data visulisation")
# Read the crime data from the CSV file into the 'crimedata' data frame
crimedata<-read.csv('crime23.csv')
# Read the temperature data from the CSV file into the 'tempdata' data frame
tempdata<-read.csv("temp2023.csv")
# Extract the date information in YYYY-MM format from the 'Date' column in 'tempdata'
tempdata$Date<-substr(tempdata$Date, start=1,stop=7)
# Rename the 'Date' column to 'date' in 'tempdata' for consistency
colnames(tempdata)[which(names(tempdata) == "Date")] <- "date"
# Remove unnecessary columns from 'crimedata'
crimedata <- crimedata[, setdiff(names(crimedata), "context"), drop = FALSE]
# Remove unnecessary columns from 'tempdata' (setdiff() is safe even if a name is absent)
drop_cols <- c("PreselevHp", "SnowDepcm", "WindkmhDir", "SunD1h")
tempdata <- tempdata[, setdiff(names(tempdata), drop_cols), drop = FALSE]
# Average every numeric column in 'tempdata' within each month
tempdata_new <- tempdata %>%
group_by(date) %>%
summarise(across(where(is.numeric),~mean(.x, na.rm=TRUE)))
# Merge the crime data with the summarized temperature data
combined_df <- merge(x = crimedata, y = tempdata_new, by = "date", all.x = TRUE)
# Create a two-way frequency table to analyze the distribution of categories over months
twowaytable<-table(combined_df$category,combined_df$date)
print(twowaytable)
##
## 2023-01 2023-02 2023-03 2023-04 2023-05 2023-06 2023-07
## anti-social-behaviour 46 49 21 53 67 52 76
## bicycle-theft 20 14 19 16 16 14 15
## burglary 17 22 14 22 15 26 14
## criminal-damage-arson 59 37 52 63 64 42 42
## drugs 14 17 21 21 22 15 17
## other-crime 7 5 6 15 3 11 12
## other-theft 48 37 35 38 42 41 51
## possession-of-weapons 3 3 11 5 7 3 8
## public-order 45 42 58 51 37 36 40
## robbery 8 7 8 7 7 17 6
## shoplifting 76 31 51 40 51 59 33
## theft-from-the-person 6 7 12 7 5 6 9
## vehicle-crime 65 15 21 29 24 45 25
## violent-crime 237 181 226 207 226 196 236
##
## 2023-08 2023-09 2023-10 2023-11 2023-12
## anti-social-behaviour 71 90 68 39 45
## bicycle-theft 21 37 26 27 10
## burglary 20 18 31 11 15
## criminal-damage-arson 33 47 45 53 44
## drugs 7 25 19 13 17
## other-crime 9 7 6 5 6
## other-theft 41 34 49 37 38
## possession-of-weapons 5 8 6 8 7
## public-order 41 45 52 45 40
## robbery 5 8 9 5 7
## shoplifting 57 33 43 39 41
## theft-from-the-person 5 7 3 4 5
## vehicle-crime 16 20 26 56 64
## violent-crime 219 263 209 221 212
The table shows that violent crime is the most frequent category in every month, peaking in September at 263 incidents and falling to its lowest in February at 181. Among property crimes, shoplifting and vehicle crime are common, while burglary and theft from the person are reported less often. Shoplifting spikes in January with 76 incidents. Some crimes vary with the seasons: bicycle theft peaks in September with 37 incidents, and anti-social behavior also peaks in September with 90 incidents, compared with 39 in November. Knowing when each category peaks helps to time crime prevention work.
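The peak months can also be read off programmatically. The following is a minimal sketch, assuming a small matrix `monthly_counts` (an illustrative name) that copies four rows of the printed table; on the full data the same calls could be applied to `twowaytable` directly.

```r
# Month labels as they appear in the printed table
months <- sprintf("2023-%02d", 1:12)

# Four rows copied from the two-way table (illustrative object name)
monthly_counts <- rbind(
  "violent-crime"         = c(237, 181, 226, 207, 226, 196, 236, 219, 263, 209, 221, 212),
  "anti-social-behaviour" = c(46, 49, 21, 53, 67, 52, 76, 71, 90, 68, 39, 45),
  "bicycle-theft"         = c(20, 14, 19, 16, 16, 14, 15, 21, 37, 26, 27, 10),
  "shoplifting"           = c(76, 31, 51, 40, 51, 59, 33, 57, 33, 43, 39, 41)
)
colnames(monthly_counts) <- months

# For each category, find the month with the highest count and that count
peak_month <- months[apply(monthly_counts, 1, which.max)]
names(peak_month) <- rownames(monthly_counts)
peak_count <- apply(monthly_counts, 1, max)

print(data.frame(peak_month, peak_count))
```

For these four rows the peaks are September for violent crime (263), anti-social behaviour (90), and bicycle theft (37), and January for shoplifting (76).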
# Create a bar plot to visualize the total frequency of crime categories
ggplot(combined_df, aes(x = category)) +
geom_bar(fill = "blue") + # Adding bars with blue color
# Add total frequency exactly above each bar
geom_text(stat = 'count', aes(label = after_stat(count)), vjust = -0.2, position = position_dodge(width = 1)) +
labs(title = "Analysis of Crime Category Frequencies", # Adding title, x-axis label, and y-axis label
x = "Crime Category",
y = "Frequency") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels for better readability
The bar graph displays how many crimes were reported in each category.
Violent crime is the most common, with 2633 incidents. Anti-social
behavior comes next with 677 incidents, then criminal damage/arson with
581, shoplifting with 554, public order with 532, other theft with 491,
and vehicle crime with 406. Possession of weapons is reported the least,
only 74 times. Theft from the person is reported 76 times and other
crime 92 times, making them relatively rare.
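Category totals can be cross-checked against the monthly table by summing each row. This is a minimal sketch, assuming the monthly counts for three categories are typed in from the printed table (the object name `counts` is illustrative); with the real data, `sort(rowSums(twowaytable), decreasing = TRUE)` should give the full ranking.

```r
# Monthly counts (January to December) copied from the two-way table
counts <- rbind(
  "vehicle-crime" = c(65, 15, 21, 29, 24, 45, 25, 16, 20, 26, 56, 64),
  "burglary"      = c(17, 22, 14, 22, 15, 26, 14, 20, 18, 31, 11, 15),
  "public-order"  = c(45, 42, 58, 51, 37, 36, 40, 41, 45, 52, 45, 40)
)

# Sum across months and rank categories from most to least frequent
totals <- sort(rowSums(counts), decreasing = TRUE)
print(totals)
# public-order 532, vehicle-crime 406, burglary 225
```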
##density plot##
# Create a density plot of average temperature
density_plot <- ggplot(combined_df, aes(x = TemperatureCAvg)) +
geom_density(alpha = 0.6) +
labs(title = "Density Plot of Average Temperature",
x = "Average Temperature (°C)",
y = "Density") +
theme_minimal()
# Convert ggplot object to plotly object
plotly_density <- ggplotly(density_plot)
# Display the interactive plot
plotly_density
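The x-axis shows the monthly average temperature, which ranges from about 5 to 17.5 degrees Celsius. Because each crime record carries the average temperature of its month, the density reflects how many crimes were recorded at each monthly temperature rather than how often that temperature occurred. The peak density, 0.17, is at about 17 degrees Celsius. At approximately 6.8 degrees Celsius the density is 0.15, and another notable peak of 0.06 appears at 12 degrees Celsius. The lowest density, 0.005, is at 9.8 degrees Celsius.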
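Since the weather columns were averaged by month before merging, every crime record in a month carries the same temperature value, so a density drawn from the merged data is weighted by the number of crimes per month. A small sketch with made-up numbers (the data frames `monthly` and `crimes` are hypothetical, not taken from the dataset) illustrates the effect:

```r
# Hypothetical monthly mean temperatures (one row per month)
monthly <- data.frame(date = c("2023-01", "2023-07"),
                      TemperatureCAvg = c(5, 17))

# Hypothetical crime records: one in January, three in July
crimes <- data.frame(date = c("2023-01", "2023-07", "2023-07", "2023-07"))

# Same kind of left join as in the analysis: each crime inherits its month's temperature
merged <- merge(crimes, monthly, by = "date", all.x = TRUE)

# Mean over months vs. mean over crime records
mean_by_month  <- mean(monthly$TemperatureCAvg)  # 11
mean_by_record <- mean(merged$TemperatureCAvg)   # 14

# How many records share each temperature value
print(table(merged$TemperatureCAvg))
```

The record-level mean is pulled toward July because July contributes three rows, which is the same weighting that shapes the density curve.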
##violin plot##
# Creating the violin
v_plot <- ggplot(combined_df, aes(x = category, y = TemperatureCAvg)) +
geom_violin(trim = FALSE) +
labs(title = "Violin Plot of average temperature by Crime Category",
x = "Crime Category",
y = "Average Temperature") + # Set titles for plot and axes
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Rotate x-axis labels
# Make the plot interactive using plotly
v_plot <- ggplotly(v_plot)
# Display the interactive plot
v_plot
The violin plot shows how the monthly average temperature is distributed across the incidents in each crime category. Each “violin” represents one category, and its width at a given temperature reflects the share of that category's incidents recorded in months with that average temperature. Wider sections indicate relatively more incidents at that temperature, and narrower sections indicate fewer. Comparing the shapes across categories helps to spot patterns or irregularities, for instance whether certain types of crime occur more often in warmer or cooler months. The plot alone does not measure how strong any such relationship is.
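A numeric summary per category can complement the visual comparison. The sketch below uses a small hypothetical data frame `df` with the same two column names as the merged data; the values are invented for illustration only.

```r
# Hypothetical merged records: category plus the month's mean temperature
df <- data.frame(
  category = c("burglary", "burglary", "burglary", "drugs", "drugs", "drugs"),
  TemperatureCAvg = c(5, 7, 12, 10, 16, 17)
)

# Median temperature and interquartile range for each category
medians <- tapply(df$TemperatureCAvg, df$category, median)
spreads <- tapply(df$TemperatureCAvg, df$category, IQR)

print(data.frame(median = medians, iqr = spreads))
```

Categories whose medians sit far apart would be the ones where the violins are visibly shifted toward warmer or cooler months.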
# Creating an interactive scatter plot with points colored by average temperature
temp_humidity_plot <- ggplot(combined_df, aes(x = TemperatureCAvg, y = Precmm,color = TemperatureCAvg)) +
geom_point() +
labs(title = "Scatter Plot of Temperature vs. Precipitation",
x = "Temperature (C)",
y = "Precipitation (mm)",
color = "avg temp") +
theme_minimal()
# Convert ggplot object to Plotly object
temp_humidity_plotly <- ggplotly(temp_humidity_plot)
# Display the interactive scatter plot
temp_humidity_plotly
The scatter plot displays precipitation (mm) on the y-axis and temperature (°C) on the x-axis. Because the weather values are monthly averages, the points fall on at most twelve distinct positions, one per month. The plot suggests that precipitation tends to be higher in warmer months, although a pattern based on so few distinct values should be read with caution. At a temperature of 6°C the precipitation is at its lowest, around 0.07 mm, while at 10.55°C it rises to 3.19 mm.
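One way to check such a relationship is to compute the correlation once per month instead of once per crime record, so that months with many crimes do not count more than others. The following is a sketch on invented numbers; the data frame `merged` is hypothetical and only mimics the structure of the merged data.

```r
# Hypothetical merged data: weather values repeat for every crime in a month
reps <- c(5, 1, 3, 2)  # invented number of crime records per month
merged <- data.frame(
  date            = rep(c("2023-01", "2023-02", "2023-03", "2023-04"), times = reps),
  TemperatureCAvg = rep(c(5.0, 6.0, 8.5, 10.5), times = reps),
  Precmm          = rep(c(1.2, 0.1, 2.0, 3.2), times = reps)
)

# Keep one row per month so each month counts once
monthly <- unique(merged[, c("date", "TemperatureCAvg", "Precmm")])

# Correlation over records (months weighted by crime counts) vs. over months
r_records <- cor(merged$TemperatureCAvg, merged$Precmm)
r_monthly <- cor(monthly$TemperatureCAvg, monthly$Precmm)

cat("records:", nrow(merged), "r =", round(r_records, 2), "\n")
cat("months: ", nrow(monthly), "r =", round(r_monthly, 2), "\n")
```

With the real data the monthly version would rest on twelve rows, which makes the small sample size explicit.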
# Create the scatter plot with a smoothing line
scatter_plot <- ggplot(combined_df, aes(x = VisKm, y = Precmm)) +
geom_point(aes(color = Precmm), alpha = 0.6) + # Points colored by precipitation amount
geom_smooth(method = "loess", se = FALSE, color = "blue") + # LOESS smoothing line without confidence interval
labs(title = "Scatter Plot of Visibility vs. Precipitation with Trend Line",
x = "Visibility (Km)", y = "Precipitation (mm)") +
scale_color_gradient(low = "skyblue", high = "darkblue") + # Color gradient from light to dark blue
theme_minimal() +
theme(legend.position = "none") # Hide legend for clarity
# Display the scatter plot
scatter_plot
## `geom_smooth()` using formula = 'y ~ x'
The scatter plot shows visibility (km) on the x-axis and precipitation
(mm) on the y-axis, with a LOESS trend line. It suggests a negative
correlation between visibility and precipitation: as precipitation
intensifies, visibility tends to diminish. This relationship is
plausible because precipitation such as rain or snow obscures the air
and reduces long-distance visibility. Some outliers are also
noticeable, marking points where the relationship between visibility
and precipitation deviates from the overall trend.
## Correlation analysis##
# Select the numeric variables you want to include in the correlation analysis
cor_variables <- c("TemperatureCAvg", "Precmm", "WindkmhInt", "PresslevHp",
"TdAvgC", "HrAvg", "WindkmhGust", "TotClOct")
# Compute the correlation matrix
corr_matrix <- cor(combined_df[, cor_variables])
# Create a correlation plot to visualize the relationship between variables
corr_plot <- ggcorrplot(corr_matrix, hc.order = TRUE, lab = TRUE)
# Display the plot
corr_plot
A correlation plot visually represents the strength of the linear relationship between pairs of variables in a dataset. Each variable is positioned along both the x and y axes, and the correlation coefficient for each pair is shown as a color and as a numeric value within the cell. A positive correlation, indicated by values closer to +1, means that as one variable increases, the other tends to increase as well. Conversely, a negative correlation, with values nearer to -1, means that as one variable increases, the other typically decreases. A correlation close to zero indicates a weak linear relationship between the variables. The correlation of each variable with itself is always exactly +1, which is why the diagonal is uniform. For example, the correlation between TemperatureCAvg and PresslevHp is negligible, at approximately 0.03.
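If any of the selected columns contains missing values, `cor()` returns `NA` for the affected pairs by default. The sketch below, on an invented data frame `weather` that borrows three column names from the dataset, shows how the `use` argument changes this. Whether the real data contain missing values is not known from the output shown here.

```r
# Hypothetical weather summary with one missing value
weather <- data.frame(
  TemperatureCAvg = c(5, 7, 10, 14, 17),
  HrAvg           = c(88, 85, 80, NA, 70),
  Precmm          = c(1.0, 0.2, 3.0, 2.1, 1.5)
)

# Default behaviour: a missing value makes the affected correlations NA
corr_default <- cor(weather)

# Pairwise deletion uses every row where both variables are present
corr_pairwise <- cor(weather, use = "pairwise.complete.obs")

print(round(corr_pairwise, 2))
```

The resulting matrix can be passed to `ggcorrplot()` in the same way as a matrix computed without the `use` argument.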
##time series##
# Prepare the date column for time series analysis
combined_df$newdate <- paste(combined_df$date, "-01", sep = "")
# Convert 'newdate' column to Date type
combined_df$newdate <- as.Date(combined_df$newdate, format = "%Y-%m-%d")
# Create a time series plot to visualize multiple weather variables over time
time_series <- ggplot(combined_df, aes(x = newdate)) +
# Each line represents a different weather variable over time
geom_line(aes(y = HrAvg, color = "Hourly Avg")) +
geom_line(aes(y = Precmm, color = "Precipitation")) +
geom_line(aes(y = VisKm, color = "Visibility")) +
geom_line(aes(y = WindkmhGust, color = "Wind Gust")) +
labs(title = "Time Series Plot of Weather Variables",
x = "Date", y = "Value", color = "Variable") +
scale_color_manual(values = c("Hourly Avg" = "blue", "Precipitation" = "red", "Visibility" = "green",
"Wind Gust" = "orange")) +
theme_minimal()
# Display the time series plot
print(time_series)
The time series plot shows four weather variables over the course of
2023. Each line tracks one variable from January to December: HrAvg
(average relative humidity in the source weather data, labeled “Hourly
Avg” in the legend), precipitation, visibility, and wind gust. The
plot makes the seasonal fluctuations in each variable visible and
allows them to be compared month by month. Because the variables
share one y-axis but are measured in different units, variables with
small values, such as precipitation, appear compressed next to those
with larger values.
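When several variables with different units share one axis, one option is to standardize each variable before plotting so that the lines become comparable in shape. This is only a sketch of that idea on invented monthly values; the data frame `monthly` is hypothetical and the approach is not part of the original analysis.

```r
# Hypothetical monthly weather values on very different scales
monthly <- data.frame(
  newdate = as.Date(c("2023-01-01", "2023-02-01", "2023-03-01", "2023-04-01")),
  HrAvg   = c(88, 84, 80, 76),
  Precmm  = c(1.2, 0.1, 2.0, 3.2),
  VisKm   = c(20, 28, 25, 30)
)

# Convert each variable to z-scores (mean 0, standard deviation 1)
vars <- c("HrAvg", "Precmm", "VisKm")
standardized <- monthly
standardized[vars] <- lapply(monthly[vars], function(x) as.numeric(scale(x)))

print(standardized)
```

The standardized columns can then be drawn with the same `geom_line()` calls, at the cost of losing the original units on the y-axis.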
##leaflet map##
# Convert 'category' to factor
combined_df$category <- factor(combined_df$category)
# Define a vector of 14 distinct colors
colors <- c("darkblue", "lightblue", "darkgreen", "lightgreen", "violet",
"darkorange", "pink", "red", "yellow", "purple",
"cyan", "brown", "gray", "orange")
# Create a leaflet map
crime_data_map <- leaflet(combined_df) %>%
addTiles() %>% # Add default OpenStreetMap tiles
addCircleMarkers(
lng = ~long, # Longitude
lat = ~lat, # Latitude
color = ~colors[as.integer(category)], # Color by crime category (palette indexed by factor level)
popup = ~category, # Popup text
radius = 3, # Marker radius
fillOpacity = 0.7 # Marker fill opacity
) %>%
addLegend(
position = "bottomright", # Position of the legend
colors = colors, # Assign 14 different colors
labels = levels(combined_df$category) # Labels for legend
)
# Display the map
crime_data_map
The map shows the location of crimes in Colchester across all 14 categories: anti-social behavior, bicycle theft, burglary, criminal damage/arson, drugs, other crime, other theft, possession of weapons, public order, robbery, shoplifting, theft from the person, vehicle crime, and violent crime. Each category is denoted by a distinctly colored dot on the map.
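For the dots to match the legend, each marker needs the color that belongs to its own category, which means expanding a palette with one entry per category into one entry per row. A minimal base R sketch of that lookup, using invented incidents and an illustrative vector name `palette_colors`:

```r
# Hypothetical categories for five incidents
category <- factor(c("drugs", "burglary", "drugs", "robbery", "burglary"))

# One color per factor level, in level order (burglary, drugs, robbery)
palette_colors <- c("darkgreen", "violet", "purple")
stopifnot(length(palette_colors) == nlevels(category))

# Index the palette by each incident's level code
marker_colors <- palette_colors[as.integer(category)]

print(data.frame(category, marker_colors))
```

Because the legend labels come from `levels(category)`, indexing by level code keeps the marker colors and the legend in the same order.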
Colchester, a lively town steeped in history, grapples with crime-related issues. To tackle these challenges effectively, it’s vital for policymakers, law enforcement, and the community to grasp crime data trends. Our analysis delves into Colchester’s 2023 crime data, focusing on monthly crime distributions.
The data shows that violent crime is a persistent issue, reaching its peak in September with 263 incidents. Property crimes such as shoplifting and vehicle crime also occur frequently throughout the year. We notice seasonal variations, such as a rise in shoplifting in January and anti-social behavior peaking in September. These findings help us target resources strategically and focus on preventing crime. By concentrating on problem areas and taking proactive measures, we aim to reduce criminal activity and improve community safety. Collaboration and using data to make decisions are essential for effectively addressing these challenges. Our objective is to create a safer and stronger community for everyone in Colchester. Through innovation and involving the community, we work towards a better future and a thriving society.
The bar graph visually demonstrates crime prevalence, with violent crime topping the list at 2633 incidents. Anti-social behavior (677) and criminal damage/arson (581) follow at a considerable distance, reflecting the multifaceted nature of local crime. Understanding these patterns helps policymakers prioritize resources effectively to combat crime and boost public safety.
The temperature density graph shows how the monthly average temperatures attached to the crime records are spread out. The graph covers temperatures from 5 to 17.5 degrees Celsius and shows peaks and valleys that merit a closer look. The highest density is around 17 degrees Celsius, meaning that a large share of the recorded crimes fall in months with roughly this average temperature. The density is lowest around 9.8 degrees Celsius, where few records fall. Read this way, the graph gives policymakers a first indication of the temperature ranges in which most crimes were recorded, which can inform seasonal planning in communities.
The correlation plot gives policymakers a clear picture of how the weather variables in the dataset relate to each other. Color gradients and numeric labels show how strongly each pair of variables is connected and in what direction. A positive correlation close to +1 means that when one variable goes up, the other tends to go up too, while a negative correlation near -1 means they move in opposite directions. Correlations close to zero, like the one between TemperatureCAvg and PresslevHp at around 0.03, indicate little linear relationship, and each variable's correlation with itself is always exactly +1. These correlations describe association rather than causation, so they can point policymakers to relationships worth investigating but do not by themselves show what causes an issue.
The map is a helpful tool for policymakers, showing where different types of crimes happen in Colchester. Each type of crime, like anti-social behavior or violent crime, is shown with a different colored dot on the map. This helps policymakers see where crimes are most common and where they need to focus their efforts. By studying this spatial data, policymakers can spot hotspots and trends in criminal activity, allowing them to target interventions and allocate resources where they’re needed most. Understanding the geography of crime helps policymakers create specific strategies to make Colchester safer and reduce crime in the area.
In conclusion, our detailed look at crime and weather data in Colchester offers valuable insights for policymakers, law enforcement, and the community. We used visual tools such as bar graphs, density plots, violin plots, scatter plots, correlation plots, a time series plot, and a map to uncover important patterns and trends. The high occurrence of violent crime throughout the year shows the need for specific actions to tackle this ongoing problem. The seasonal ups and downs in certain crimes, such as bicycle theft and anti-social behavior, suggest that strategies should be adjusted according to the time of year. The weather data also hint at possible links between weather conditions and crime, although this analysis does not test them formally, which points to the value of a combined approach to crime prevention. By using insights from both crime and weather data, policymakers can create effective plans to keep communities safe and resilient. Overall, our research emphasizes the value of using data to make informed decisions on complex issues like crime, leading to safer neighborhoods in Colchester and beyond.